Combination of words and word categories in varigram histories
Abstract
This paper presents a new kind of language model: category/word varigrams. This special model type permits a tight integration of word-based and category-based modeling of word sequences. Any succession of words and word categories may be employed to describe a given word history. This provides much greater flexibility than previous combinations of word-based and category-based language models. Experiments on the WSJ0 corpus and the 1994 ARPA evaluation data indicate that the category/word varigram yields a perplexity reduction of up to 10 percent compared to a word varigram of the same size, and improves the word error rate (WER) by 7 percent. Compared to a linear interpolation of a word-based and a category-based n-gram, the WER improvement is about 4 percent.
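As a rough illustration of what "any succession of words and word categories" can mean for a history, the following minimal Python sketch enumerates every mixed description of a short word history. The function name `mixed_histories`, the toy category map, and the angle-bracket category labels are assumptions made for this example; they are not taken from the paper, and the sketch does not show how the model selects among or estimates probabilities for these histories.

```python
from itertools import product


def mixed_histories(history, category_of):
    """Enumerate every way to describe `history` using words and/or categories.

    At each position the description may keep the word itself or replace it
    with its category label, so a history of n words has 2**n descriptions.
    """
    # Per position: (the word, its category label).
    options = [(word, category_of[word]) for word in history]
    return list(product(*options))


# Hypothetical toy category map, for illustration only.
category_of = {"on": "<PREP>", "monday": "<DAY>"}

for description in mixed_histories(("on", "monday"), category_of):
    print(description)
```

A pure word-based model would only use the first description (all words), and a pure category-based model only the last (all categories); the mixed descriptions in between are what the added flexibility refers to.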
Similar resources
Compact n-gram models by incremental growing and clustering of histories
This work concerns building n-gram language models that are suitable for large vocabulary speech recognition in devices that have a restricted amount of memory and space available. Our target language is Finnish, and in order to evade the problems of its rich morphology, we use sub-word units, morphs, as model units instead of the words. In the proposed model we apply incremental growing and cl...
Adaptive topic-dependent language modelling using word-based varigrams
This paper presents two extensions of the standard interpolated word trigram and cache model, namely the extension of the trigram model by useful word m-grams with m > 3, resulting in a varigram model, and the addition of topic-specific trigram models. We give the criteria for selecting useful m-grams and for partitioning the training corpus into topic-specific subcorpora. We apply both extens...
English Vocabulary for Equine Veterans: How Different from GSL and AWL Words
ESP students are usually advised to master general and academic word lists such as West's (1953) General Service List (GSL) and Coxhead's (2000) Academic Word List (AWL) to be able to read their academic texts. However, it seems that university students may not need to learn all the words in the two lists, as some words in the lists occur less frequently in academic texts. Moreover, there are ...